Abstract. In this paper we present a number of metrics for usage of the 
SAO/NASA Astrophysics Data System (ADS). Since the ADS is used by the 
entire astronomical community, these are indicative of how the astronomical 
literature is used. We will show how the use of the ADS has changed both 
quantitatively and qualitatively. We will also show that different types of users 
access the system in different ways. Finally, we show how use of the ADS has 
evolved over the years in various regions of the world. 
1. Introduction 

The SAO/NASA Astrophysics Data System (hereafter ADS) is a digital library
and a vital source for bibliographic information in astronomy. The vast 
majority of astronomical researchers in the world use the ADS on a daily or 
near-daily basis. The use of the ADS has not only changed quantitatively but 
also qualitatively. Initially almost exclusively used by professional astronomers, 
the ADS now also has become a public service through external, general search 
engines (like Google, Yahoo, Microsoft Live Search and Ask.com, to name a 
few). In Henneken et al. (2007) we observed that up to the middle of 2004, the 
number of ADS users doubled on a bi-yearly basis. Since the ADS started to be 
indexed by general search engines, the number of incidental users has increased 
dramatically. However, the number of typical users (more than 10 visits per 
month) has continued to follow the same growth pattern. 

With different types of users come different types of use. A professional 
astronomer has different interests than an occasional user. One way of
illustrating this is to look at the distribution of publication years for the literature 
people are interested in. We will also look at the diversity of ADS users from a 
geographical point of view. This will indicate whether increased Internet access 
actually results in an increase of ADS usage. This is particularly interesting with 
respect to aspects of the "Digital Divide" (see e.g. ITU (2007)). In the next 
section, we will describe the character of the data we are working with. The 
following section will show the results, which will then be discussed in section 4. 



2. Data 

The ADS is an electronic library where the system log files record queries and 
access to its records over time. For every bibliographic record in the ADS, a 
user can choose to view or access various types of metadata associated with 
that record. A "visit" (or "read") is defined as the selection of a metadata 
link. The ADS's data log entries record what type of data item was selected 
for which article. For example, in the period of January and February of 2008, 
72% of all requests were for an abstract, 19% for the full article text, 4% for 
citation histories and 3% for e-prints from arXiv. Our data log entries show 
where a visitor came from (for example, whether a visitor used the ADS directly 
or came in via an external source) and what he/she read. When a user requests 
more information than just an abstract and/or performs a query, a cookie is 
assigned to that user. This allows us to compile frequency statistics for these 
users and determine the group of "typical users". 
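To make the log-based definitions above concrete, the following Python sketch computes the share of requests per link type and selects "typical users" from cookie-tagged requests. It assumes a hypothetical log record of the form (user_id, month, link_type), with user_id set to None when no cookie was assigned; the actual ADS log format is not described in this paper.

```python
from collections import Counter, defaultdict

# Hypothetical log record: (user_id, month, link_type). These field names are
# assumptions for illustration; user_id is None when no cookie was assigned.

def request_type_fractions(records):
    """Return the fraction of all requests that went to each link type."""
    counts = Counter(link_type for _, _, link_type in records)
    total = sum(counts.values())
    return {link: n / total for link, n in counts.items()}

def typical_users(records, min_reads=10):
    """Return user ids with at least `min_reads` reads in some month.

    Requests without a cookie cannot be attributed to a user over time,
    so they are skipped here.
    """
    reads = defaultdict(int)
    for user_id, month, _ in records:
        if user_id is not None:
            reads[(user_id, month)] += 1
    return {user for (user, _), n in reads.items() if n >= min_reads}

records = ([("u1", "2008-01", "abstract")] * 12
           + [("u2", "2008-01", "fulltext")] * 3
           + [(None, "2008-01", "abstract")] * 5)
print(request_type_fractions(records))
print(typical_users(records))
```

The threshold is a parameter because the paper words it both as "10 or more" and "more than 10" visits per month.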

These ingredients provide us with the information we need to analyze the 
behavior of different types of users. For our analysis we use the ADS log files 
of January and February of 2008. These log files represent over 5.9 million 
requests from 317,753 unique users with a cookie and 1,071,416 unique users 
without a cookie, the latter defined as the number of unique IP addresses 
associated with queries without a cookie. In the next section, besides looking 
purely at usage patterns, we will also compare usage to Gross Domestic Product 
(GDP) and Internet usage. We used GDP data from the World Economic Outlook 
database (IMF (2008)). Internet usage was obtained from the EarthTrends 
database (WRI (2008)). 
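The two user counts above rely on different identifiers. A minimal sketch, assuming hypothetical request dictionaries with 'cookie' and 'ip' keys, counts cookie holders by cookie and cookieless users by distinct IP address. An IP address is only a proxy: several people behind one shared address count once, and one person with a changing address may count several times.

```python
def count_unique_users(requests):
    """Return (users with a cookie, users without a cookie).

    Each request is a dict with hypothetical keys 'cookie' (None if absent)
    and 'ip'. Cookie holders are counted by cookie id; requests without a
    cookie are counted by distinct IP address.
    """
    cookies = {r["cookie"] for r in requests if r["cookie"] is not None}
    ips_without_cookie = {r["ip"] for r in requests if r["cookie"] is None}
    return len(cookies), len(ips_without_cookie)

requests = [
    {"cookie": "a", "ip": "192.0.2.1"},
    {"cookie": "a", "ip": "192.0.2.2"},   # same user, different address
    {"cookie": None, "ip": "192.0.2.3"},
    {"cookie": None, "ip": "192.0.2.3"},  # repeat visit, same address
    {"cookie": None, "ip": "192.0.2.4"},
]
print(count_unique_users(requests))
```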



3. Results 

3.1. General readership 

Figure 1 (top) illustrates the observation we made in the introduction: "Since the 
ADS started to be indexed by general search engines, the number of incidental 
users has increased dramatically". The line marked '+' shows the total number 
of users. This includes incidental users who just look at an abstract. Excluding 
incidental users, the total number of users is shown by the line marked with 
'x' (these users request additional metadata, besides abstracts, and perform 
queries). Finally, the number of users who use the ADS regularly (10 or more 
times per month) is given by the line marked with '*'. The bottom panel is 
an illustration of the qualitative change in use. It shows the distribution of the 
fraction of users as a function of the number of reads in the month of January 
in the years 2000, 2002, 2004, 2006 and 2008. 

Table 1 gives an overview of the number of visits arriving through external 
websites (search engines, Wikipedia, Astronomy Picture of the Day and arXiv) 
in January and February of 2008. The total number of visits in this period was 
5,941,983. 
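As a quick check on the scale of this traffic, the sketch below computes each site's share of all visits from the Table 1 counts. Only the numbers come from the table; the script itself is illustrative.

```python
# Visit counts from Table 1 (January and February 2008).
EXTERNAL_VISITS = {
    "Google": 1_920_797,
    "Google Scholar": 315_864,
    "Wikipedia": 31_094,
    "Astronomy Picture of the Day": 23_742,
    "arXiv": 17_474,
    "MSN": 11_556,
    "Yahoo": 2_015,
    "Ask.com": 1_354,
}
TOTAL_VISITS = 5_941_983

def shares(visits, total):
    """Return each site's share of all visits, plus the combined share."""
    per_site = {site: n / total for site, n in visits.items()}
    combined = sum(visits.values()) / total
    return per_site, combined

per_site, combined = shares(EXTERNAL_VISITS, TOTAL_VISITS)
for site, frac in sorted(per_site.items(), key=lambda kv: -kv[1]):
    print(f"{site:30s} {frac:6.2%}")
print(f"{'All listed sites':30s} {combined:6.2%}")
```

With these counts, Google accounts for roughly a third of all visits, and the listed sites together for just under 40%.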
Figure 1. Top: Number of ADS users over time. The line marked '+' shows 
the total number of users. This includes incidental users who just look at an 
abstract. Excluding incidental users, the total number of users is shown by 
the line marked with 'x' (these users request additional metadata, besides 
abstracts, and perform queries). Finally, the number of users who use the 
ADS regularly (10 or more times per month) is given by the line marked with 
'*'. Bottom: user fraction ranked by the number of reads for the month of 
January in 2000, 2002, 2004, 2006 and 2008. 

ADS Usage Through External Sites

Site                            Number of visits
Google                                 1,920,797
Google Scholar                           315,864
Wikipedia                                 31,094
Astronomy Picture of the Day              23,742
arXiv                                     17,474
MSN                                       11,556
Yahoo                                      2,015
Ask.com                                    1,354

Table 1. Use of the ADS through external sites in January and February 2008. 
The total number of visits in this period was 5,941,983.

3.2. Readership for different types of readers 

For the first part of our analysis (Figures 2 and 3) we have limited ourselves to 
data requests for the following journals: The Astrophysical Journal (including 
Letters, but excluding the Supplement), The Astronomical Journal, Monthly 
Notices of the Royal Astronomical Society and Astronomy & Astrophysics. These 
journals constitute the core of research publications in astronomy and are read 
by all active astronomers on a regular basis. First we determined the total use 
during these two months and the total number of different articles for which 
data were requested. Since the number of articles published varies with time, 
it makes more sense to scale the totals by the number of papers published in a 
given year. This gives the mean usage for each publication year and the fraction 
of published papers for which information was requested. 
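The normalization by publication year can be sketched as follows. The input layout (one (bibcode, publication_year) pair per request, plus a table of papers published per year) is a hypothetical stand-in for the ADS logs and journal counts.

```python
from collections import defaultdict

def usage_by_publication_year(reads, papers_published):
    """Mean reads per paper and fraction of papers read, per publication year.

    reads: iterable of (bibcode, publication_year) pairs, one per request.
    papers_published: dict mapping publication_year -> papers published.
    """
    total_reads = defaultdict(int)
    distinct = defaultdict(set)
    for bibcode, year in reads:
        total_reads[year] += 1
        distinct[year].add(bibcode)
    result = {}
    for year, n_papers in papers_published.items():
        mean_use = total_reads[year] / n_papers          # reads per published paper
        fraction_read = len(distinct[year]) / n_papers   # share of papers read at all
        result[year] = (mean_use, fraction_read)
    return result

reads = [("A", 2007), ("A", 2007), ("B", 2007), ("C", 2000)]
print(usage_by_publication_year(reads, {2007: 4, 2000: 2}))
```

Dividing by the number of papers published, rather than by the number of papers read, is what keeps the mean usage comparable across years with different publication volumes.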
Figure 2. Article use through the ADS as a function of publication year. 



These numbers are shown in Figure 2. The line marked with '+' shows the 
mean usage per paper, and the line marked with 'x' represents the fraction of 
different articles for which data were requested. This figure is very similar to 
Figure 12 in Kurtz et al. (2000). 
Figure 3. Comparison of readership patterns from ADS and Google Scholar 
queries, as observed in ADS's access logs. The line marked with '+' shows 
the readership by people using the ADS search engine. The line marked 
with '□' corresponds to the readership by people who used the Google 
Scholar engine. The line marked with 'x' shows the citation rate to the 
articles, while the fourth line represents their total number of citations. 



What picture do we get when we zoom in on different types of users? In 
particular, we will look at users whom we will call "ADS regulars" (mostly 
astronomers and physicists) and at people requesting information through Google 
Scholar. The group of "ADS regulars" consists of people who use the system 
more than 10 times per month. Figure 3 shows the mean usage for these types 
of users as a function of publication year. The line marked with '+' shows the 
readership by people using the ADS search engine. The line marked with 
'□' corresponds to the readership by people who used the Google Scholar 
engine. The line marked with 'x' shows the citation rate to the articles, while 
the fourth line represents their total number of citations. In other words, this 
figure compares the so-called "obsolescence functions" for cites and use (see 
e.g. Kurtz et al. (2003)) for articles published in the four main astronomy 
journals as read by these two types of users. 

An interesting metric for the "ADS Regulars" is the median of monthly 
usage. In the period of January 1998 through January 2008 the median for the 
monthly number of reads (by these users) turns out to be fairly constant at a 
value of 21 reads per month. 
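A sketch of the median computation, assuming a hypothetical mapping from month to per-user read counts. A user counts as an "ADS regular" in a month when they have more than 10 reads in that month, following the definition given above.

```python
from statistics import median

def monthly_medians(reads_by_month, more_than=10):
    """Median monthly reads among "ADS regulars" for each month.

    reads_by_month: hypothetical dict mapping a month label to a dict of
    {user_id: number of reads in that month}. A user is a regular in a
    month when their read count exceeds `more_than`.
    """
    medians = {}
    for month, per_user in reads_by_month.items():
        regulars = [n for n in per_user.values() if n > more_than]
        if regulars:  # statistics.median raises on an empty list
            medians[month] = median(regulars)
    return medians

# Hypothetical example: three users qualify (12, 21 and 40 reads).
print(monthly_medians({"2008-01": {"a": 5, "b": 12, "c": 21, "d": 40}}))
# prints {'2008-01': 21}
```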

3.3. Readership for different geographical regions 

For the second part of our analysis we zoom in on usage data, to see how 
readership varies per geographical region. In section 2, we mentioned that 
our data logs also record the origin of requests. This allows us to determine 
use as a function of geographical region. Since science and technology depend 
heavily on budgets, it is particularly interesting to look at the readership in a 
particular region as a function of GDP per capita (GPC), especially for 
developing countries. Figure 4 shows the results for eight regions of various 
economic strength. Each data point corresponds to one year. 

In the top panel, we explore readership as a fraction of total readership 
in a given year. This fraction will tell us to what extent region usage growth 
follows the growth trend on world level. We will refer to the set of data points 
for each region as a "trail". A trail maps out the relation between GPC and the 
fraction of world usage over time. The trails shown in this panel can be classified 
into three groups. The trails for the EU and the USA evolve from the left to 
the right and slightly down, as time progresses. The trail for South America 
initially moves up (from the smallest fraction of world usage) and to the left, as 
the region moves into a recession, and then to the right, with a close to constant 
fraction of world usage. The trails for the remaining regions show an evolution 
of an increase in both GPC and the fraction of world usage. 
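A minimal sketch of how one "trail" per region could be assembled, assuming hypothetical per-year dictionaries of regional ADS usage, GDP per capita and total world usage. Each trail is the time-ordered list of (GPC, fraction of world usage) points that the top panel plots.

```python
from collections import defaultdict

def usage_trails(usage, gpc, world_usage):
    """Build one trail per region: a list of (GPC, fraction of world usage).

    usage:       dict year -> dict region -> ADS usage in that year
    gpc:         dict year -> dict region -> GDP per capita in that year
    world_usage: dict year -> total ADS usage worldwide in that year
    All three layouts are assumptions for illustration.
    """
    trails = defaultdict(list)
    for year in sorted(usage):  # chronological order traces the trail
        for region, n in usage[year].items():
            trails[region].append((gpc[year][region], n / world_usage[year]))
    return dict(trails)

usage = {1997: {"USA": 40, "China": 1}, 1998: {"USA": 50, "China": 4}}
gpc = {1997: {"USA": 30000, "China": 800}, 1998: {"USA": 31000, "China": 850}}
world = {1997: 100, 1998: 125}
print(usage_trails(usage, gpc, world))
```

In this toy example the USA keeps a constant fraction while its GPC grows (a trail moving right), and China moves both right and up, the pattern described above for the remaining regions.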

The middle panel looks at pure growth within the region as a function of 
GPC. In this figure usage has been normalized by the 1997 level, so the numbers 
show a relative growth with respect to 1997. This normalization brings out 
similarities in intrinsic growth. The general flow in this diagram is up and to 
the right, as time progresses. The most pronounced exception is South America, 
moving into its deep recession. Essentially there seem to be two classes of trails, 
one formed by the EU and the USA and the other by the remaining regions. 
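The normalization used in the middle panel amounts to dividing each series by its 1997 value. A minimal sketch, assuming hypothetical {year: value} dictionaries per region:

```python
def normalize_to_base_year(series, base_year=1997):
    """Divide every value in a {year: value} series by its base-year value."""
    base = series[base_year]
    return {year: value / base for year, value in sorted(series.items())}

def middle_panel_points(usage, gpc, base_year=1997):
    """Pair each year's GPC with ADS usage relative to the base year."""
    rel = normalize_to_base_year(usage, base_year)
    return [(gpc[year], rel[year]) for year in sorted(rel)]

usage = {1997: 200, 2000: 300, 2007: 800}
gpc = {1997: 1000, 2000: 1100, 2007: 1500}
print(middle_panel_points(usage, gpc))
```

The Figure 4 caption speaks of "quantities" being normalized; if GPC is normalized as well, the same helper can be applied to the GPC series.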

The lower panel compares the number of ADS users in a region with the 
number of Internet users, both normalized by their 1997 values. The flow of 
time here is to the right and up. Points above the solid line indicate that in 
a particular region the number of ADS users grows faster than the number of 
Internet users. 
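The comparison in the lower panel can be sketched as follows. Both series are normalized by their 1997 values, and a year counts as "above the line" when normalized ADS-user growth exceeds normalized Internet-user growth. Treating the solid line as the diagonal y = x is our assumption based on the description above, and the data layout is hypothetical.

```python
def years_above_line(ads_users, internet_users, base_year=1997):
    """Years in which ADS-user growth outpaces Internet-user growth.

    Both arguments are hypothetical {year: count} dicts for one region.
    x = Internet users / base-year value, y = ADS users / base-year value;
    a point lies above the assumed line y = x when y > x.
    """
    ads0 = ads_users[base_year]
    net0 = internet_users[base_year]
    above = []
    for year in sorted(set(ads_users) & set(internet_users)):
        x = internet_users[year] / net0
        y = ads_users[year] / ads0
        if y > x:
            above.append(year)
    return above

ads = {1997: 10, 2000: 40, 2003: 50}
net = {1997: 100, 2000: 300, 2003: 600}
print(years_above_line(ads, net))
```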

In order to get the data used in Figure 4, the following operations were 
performed on the origin information in our data logs: (a) requests originating 
from a ".com" or ".net" domain were assigned to the country of the referrer 
URL, and (b) we set a limit of 2000 reads per year for any user (thus filtering 
out atypical usage). The region of "Least Developed Countries" consists of the 
49 countries (as of the writing of this paper) defined as such by the UN. The 
appendix includes the list of countries in this category. 
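The two preprocessing steps can be sketched as below. The hostname inputs and the direct mapping from a top-level domain to a country code are simplifying assumptions for illustration. We also read the 2000-reads limit as a cap on each user's yearly count; excluding such users entirely would be the other possible reading.

```python
from urllib.parse import urlparse

GENERIC_TLDS = {"com", "net"}

def origin_country(host, referrer_url=None):
    """Country code for a request, or None if it cannot be determined.

    host: reverse-resolved hostname of the requester (hypothetical input).
    Requests from .com/.net hosts are attributed to the referrer URL's
    top-level domain instead, as in step (a).
    """
    tld = host.rsplit(".", 1)[-1].lower()
    if tld in GENERIC_TLDS:
        if not referrer_url:
            return None
        ref_host = urlparse(referrer_url).hostname or ""
        tld = ref_host.rsplit(".", 1)[-1].lower()
    return None if not tld or tld in GENERIC_TLDS else tld

def cap_reads(reads_per_user, cap=2000):
    """Step (b): limit each user's yearly read count to `cap`."""
    return {user: min(n, cap) for user, n in reads_per_user.items()}

print(origin_country("pc1.uni.nl"))
print(origin_country("host.isp.com", "http://www.example.de/page"))
print(cap_reads({"a": 150, "b": 5000}))
```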



4. Discussion 



Figure 1 shows how the ADS has become a public service whose reach goes well 
beyond the scholarly community. Since 2005, the number of people visiting the 
ADS via external sites has increased dramatically. This trend will only intensify 
with the advent of the World Wide Telescope (Gray et al. (2002)), which has 
links to the ADS. Google Sky (Scranton et al. (2007)) already contains 
context-sensitive menus that enable positional searches for papers in ADS. 
Incidental users have contributed the most to the strong increase that started 
in 2005. We have observed that a fraction of these people actually started using 
the ADS on a semi-regular basis. The lower panel of Figure 1 illustrates the 
qualitative change in ADS usage. In January of 2006 and 2008, the distribution 
of reads over users (shown as a fraction of the total number of users) differs 
from that in the other years shown, indicating that the ratio of frequent to 
infrequent users has changed considerably. Table 1 shows that in January and 
February of 2008, almost 2 million visitors came to the ADS through Google, 
which appears to be a growing trend. On average, these people viewed 2.2 
abstracts of research-grade papers each. We think that this has an impact on 
the science education of the general public, comparable to that of some 
general-circulation magazines. Quantifying this would need further research. 

Figure 4. Top panel: Fraction of world usage as a function of GDP per 
capita (GPC) for 8 different regions in the time period of 1997 through 2007. 
Middle panel: Region ADS usage as a function of GPC. Quantities have been 
normalized by their value in the year 1997. Lower panel: Number of ADS 
users in a region as a function of the number of Internet users for that region. 

Figure 2 shows the popularity of papers as a function of publication date. 
Which papers (in terms of their age) get the most attention from people who 
use the ADS regularly? The figure shows that more recent papers get the most 
attention. This is not simply because they are read more: the fact that ADS 
reads increase more rapidly for more recent papers than the fraction of unique 
papers being read shows that the fraction of interesting papers is larger among 
more recent papers. This was also observed in Kurtz et al. (2000). 

Figure 3 can be read in various ways. One interpretation is that use 
obsolescence can be substantially different for different types of users, even 
when they access the same documents. How a search engine works can have 
significant effects. Google Scholar use approximately matches the total number 
of citations, which is similar to the reading habits of students. Of course, the 
ADS can also provide results similar to Google Scholar when the user requests 
that the returned articles be sorted by citation count. There is an underlying 
reason for the strong correlation between the total number of citations and the 
readership patterns through Google Scholar: the correlation between PageRank 
and the total number of citations (see e.g. Chen et al. (2007)). If we classify 
papers on the basis of their references and citations and average the PageRank 
over these classes, it turns out that the average PageRank is proportional to 
the total number of citations and is independent of the number of references 
(see Fortunato et al. (2006)). A consequence of the results shown in Figure 3 is 
that the ADS provides what researchers want, while Google Scholar does not. 
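For readers unfamiliar with PageRank, the following is a textbook power-iteration sketch on a small citation graph, in which rank flows from each citing paper to the papers it cites. It is not the ranking used by Google Scholar, which is not public, nor necessarily the exact variant used in the cited studies; the damping factor of 0.85 is the conventional web-search default and is an assumption here.

```python
def pagerank(cites, damping=0.85, iterations=100):
    """PageRank on a citation graph by power iteration (textbook sketch).

    cites: dict paper -> list of papers it cites (its references).
    Papers with no references spread their rank uniformly over all papers,
    so the ranks always sum to 1.
    """
    papers = set(cites) | {p for refs in cites.values() for p in refs}
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in papers if not cites.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in papers}
        for citing, refs in cites.items():
            if refs:
                share = damping * rank[citing] / len(refs)
                for cited in refs:
                    new[cited] += share
        rank = new
    return rank

# Hypothetical toy graph: C is cited by A, B and D; A is cited by D.
cites = {"A": ["C"], "B": ["C"], "C": [], "D": ["A", "C"]}
ranks = pagerank(cites)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this toy graph the most-cited paper also receives the highest rank, which is the kind of correlation between PageRank and citation counts referred to above.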

Another interpretation of figure [3] focuses on obsolescence of use and cites. 
A fundamental difference between cites and use is that the former is a public act, 
while the latter is a private act. Citations are created by authors of scholarly 
articles, while use, in general, is not solely the result of actions by authors. In 
other words, authors are often users, but there is a large set of users who are 
not authors. This contributes to the complexity of the relation between use 
and cites. The reader can find a detailed mathematical analysis of obsolescence 
in Egghe et al. (2000) and a phenomenological analysis in Kurtz et al. (2003), 
where use is modeled as consisting of 4 modes, each with its own characteristic 
time scale. Just like the results in Kurtz et al. (2003), the results in this paper 
represent the mean current use per article published, as a function of article 
age. In this way we directly measure the intrinsic decay. It is interesting to 
observe the close correlation between use and cites in the mean. Modeling our 
observations falls outside the scope of this paper; we refer to Kurtz et al. 
(2003) for a detailed discussion of obsolescence of cites and use. Actual citation 
distributions among papers classified according to popularity will need further 
research and will be the subject of future publications. 

What does it mean to have a median that is fairly constant at a value of about 
21 reads per month? Since a month has roughly 21 working days, it indicates 
that our frequent users, on average, use the ADS on a daily basis. Initially this 
meant that all professional astronomers use the ADS on a daily basis (see 
Kurtz et al. (2005) and Tenopir et al. (2005)). This is probably still true, but 
now with the addition of a growing group of physicists. The value of 21 reads 
per month agrees with the findings in Kurtz et al. (2005), where a median of 22 
reads per month was found for the month of August 2001. 

Figure 4 tells many stories. In the top and middle diagrams, each region has 
11 data points, one per year (1997 through 2007). The lower diagram has 9 
points for each region (1997 through 2005). South Africa (ZA) was omitted from 
the region "Africa" because it would completely dominate the analysis for this 
region (the fraction of world usage for South Africa alone is larger than that 
for the rest of Africa). 

In the top panel of Figure 4 it is immediately clear that the EU and the USA 
are different from the other regions. Obviously, the GPC for both is higher 
than for the other regions. Since the lion's share of astronomy research is 
performed in either the USA or the EU, it is also no surprise that their fraction 
of world usage is substantially higher than for the other regions. More 
significant is the difference in how the fraction of world usage changes over 
time. For both the EU and the USA, this fraction gradually decreases over time. 
The explanation is that these two regions represent the bulk of ADS usage 
(ranging from 84% of world usage in 1997 to 70% in 2007), and because the 
number of Internet users in these regions is more or less saturated, their ADS 
usage comes from existing Internet users. So, even though there is a steady 
increase in the number of users in these regions, their fraction decreases 
because overall usage increases faster. China displays a similar trend in the 
recent past, following a period of fast growth (until 2004). The character of the 
Chinese economy has changed: initially we see growth typical for a low-income 
region, but probably around 2004, China moved into being a middle-income 
region. This seems to be the case if we take the number of Internet users as an 
indicator for economic growth. The number of Internet users in middle- and 
high-income regions saturates over time (see below). The data points for South 
America show that there is an increase in ADS usage, even when the economy 
for that region goes through a heavy recession. Around 2002 the fraction 
settles on a value of about 3%. 

The middle panel of Figure 4 shows that all regions display an intrinsic growth 
(with respect to their 1997 values). The overall trend is that the intrinsic 
growth in the EU and USA is slower than in the other regions, which is to be 
expected for regions where the density of Internet users changes relatively 
little (because of the wide initial penetration of Internet connectivity). Figure 5 
illustrates this fact. The definitions of high-, middle- and low-income countries 
are those of the World Bank: high-income countries are those with a Gross 
National Income per capita (GNC) in 2007 of $11,456 or more, middle-income 
countries are those with a GNC between $936 and $11,455, and low-income 
countries are those with a GNC of $935 or less. This figure shows that, using 
Internet usage as an indicator, China changed from being a low-income country 
to a middle-income country. Low-income countries do not show a clear 
flattening trend in Internet user density, which is clearly present for middle- 
and high-income countries. This is probably because low-income countries still 
have a lot of potential for growth. China becoming a middle-income country is 
probably the reason for the decrease in its fraction of world usage, seen in 
Figure 4 (top). 
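The World Bank income classes reduce to two thresholds on GNI per capita. A minimal sketch using the 2007 World Bank cut-offs (high: $11,456 or more; middle: $936 to $11,455; low: $935 or less), with incomes given in whole US dollars:

```python
def income_class(gni_per_capita):
    """World Bank income class from 2007 GNI per capita (whole US dollars)."""
    if gni_per_capita >= 11_456:
        return "high"
    if gni_per_capita >= 936:
        return "middle"
    return "low"

# Hypothetical example values, not data from this paper.
for gni in (500, 935, 936, 2_500, 11_455, 11_456, 40_000):
    print(gni, income_class(gni))
```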

After the Internet user density flattens, the ADS user density still increases, 
because there is still substantial potential for use diversification within the 
existing body of users. This is clearly shown in the lower panel of Figure 4, 
most prominently for the USA: the number of ADS users increases rapidly, while 
the number of Internet users does not. Once everybody is online, the only thing 
that changes is browsing behavior. The EU shows a similar trend, with a delay. 
This is probably due to a longer penetration time of the Internet in various 
member states of the EU. In general, points above the solid line in the lower 
panel of Figure 4 indicate a growth of the number of ADS users that is faster 
than the growth of Internet users in that region. 



5. Conclusions 

In terms of its audience, the ADS has not only changed quantitatively, but also 
qualitatively. Besides a steady growth of the ADS regular users, we observe a 
dramatic increase in incidental users. The ADS is the gateway to online 
literature for scientists, used by virtually all professional astronomers on a 
daily basis. Since 2005 it has also played a growing role as a source of science 
education for the general public. 

Comparing the group of "ADS regulars" with the group visiting the ADS 
via Google Scholar shows that the obsolescence curve for the latter is fairly flat, 
corresponding with reading behavior by people acquainting themselves with a 
subject. This means Google Scholar is not the right tool for staying up-to-date 
with the latest events in a field. Looking at how professional astronomers use the 
ADS shows that the obsolescence function for them closely follows the citation 
rate (as a function of paper age). 

Although ADS usage increased in regions like the EU and the USA, the 
percentage of world usage has decreased for these regions. This is because the 
growth in world usage is mainly driven by regions with the biggest potential 
for growth. The density of Internet users reaches a saturation point in middle- 
and high-income regions at which point ADS usage increases at a slower rate. 
It is encouraging to see the rapid increase in Internet user density in low-income 
regions and a similar increase in the number of ADS users in those regions. It 
indicates that increased access to electronic information is being used and in this 
sense there is a narrowing of the "Digital Divide" for these regions. Whether 
this increased access also resulted in an increased scientific output needs further 
bibliometric research. 
6. Appendix 

A country is classified as a "Least Developed Country" when it meets the 
following three criteria (UN (2007), UN (2008)): 



• a low-income criterion, based on a three-year average estimate of the gross 
national income (GNI) per capita (under $745 to be included in the list, 
above $900 to be removed from the list); 

• a human capital status criterion, involving a composite Human Assets 
Index (HAI) based on indicators of: (a) nutrition: percentage of population 
undernourished; (b) health: mortality rate for children aged five years or 
under; (c) education: the gross secondary school enrolment ratio; and (d) 
adult literacy rate; and 

• an economic vulnerability criterion, involving a composite Economic 
Vulnerability Index (EVI) based on indicators of: (a) population size; (b) 
remoteness; (c) merchandise export concentration; (d) share of agriculture, 
forestry and fisheries in gross domestic product; (e) homelessness owing to 
natural disasters; (f) instability of agricultural production; and (g) 
instability of exports of goods and services. 

To be added to the list, a country must satisfy all three criteria. In addition, 
since the fundamental meaning of the LDC category, i.e. the recognition of 
structural handicaps, excludes large economies, the population must not exceed 
75 million. To become eligible for graduation, a country must reach threshold 
levels for graduation for at least two of the aforementioned three criteria, or its 
GNI per capita must be at least twice the threshold level, and the likelihood 
that the level of GNI per capita is sustainable must be deemed high. 
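The inclusion and graduation rules above can be written as two predicates. This is a sketch under stated assumptions: the HAI and EVI thresholds are not given in the text, so they enter as booleans; we assume "twice the threshold level" refers to the $900 graduation threshold and that the sustainability condition applies only to this income-only route.

```python
# Income thresholds quoted above (three-year average GNI per capita, US$).
INCLUSION_GNI = 745      # "under $745 to be included in the list"
GRADUATION_GNI = 900     # "above $900 to be removed from the list"
MAX_POPULATION_MILLIONS = 75

def eligible_for_inclusion(gni_per_capita, meets_hai, meets_evi, population_millions):
    """All three criteria must hold and the population must not exceed 75 million.

    meets_hai / meets_evi stand for "HAI below its inclusion threshold" and
    "EVI above its inclusion threshold"; the numeric thresholds are not given.
    """
    return (gni_per_capita < INCLUSION_GNI
            and meets_hai
            and meets_evi
            and population_millions <= MAX_POPULATION_MILLIONS)

def eligible_for_graduation(gni_per_capita, passes_hai_graduation,
                            passes_evi_graduation, income_sustainable):
    """Two of three graduation thresholds reached, or the income-only route.

    Assumption: the income-only route needs GNI per capita of at least twice
    the $900 graduation threshold plus income judged sustainable.
    """
    criteria_met = sum([gni_per_capita > GRADUATION_GNI,
                        passes_hai_graduation,
                        passes_evi_graduation])
    if criteria_met >= 2:
        return True
    return gni_per_capita >= 2 * GRADUATION_GNI and income_sustainable

# Hypothetical country: low income, weak human assets, vulnerable, 10 million people.
print(eligible_for_inclusion(500, True, True, 10))
```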
Africa (33): Angola, Benin, Burkina Faso, Burundi, Central African Republic, 
Chad, Comoros, Dem. Rep. of the Congo, Djibouti, Equatorial Guinea, Eritrea, 
Ethiopia, Gambia, Guinea, Guinea-Bissau, Lesotho, Liberia, Madagascar, Malawi, 
Mali, Mauritania, Mozambique, Niger, Rwanda, Sao Tome and Principe, Senegal, 
Sierra Leone, Somalia, Sudan, Togo, Uganda, United Rep. of Tanzania, Zambia 

Asia (15): Afghanistan, Bangladesh, Bhutan, Cambodia, Kiribati, Lao People's 
Dem. Rep., Maldives, Myanmar, Nepal, Samoa, Solomon Islands, Timor-Leste, 
Tuvalu, Vanuatu, Yemen 

Latin America and the Caribbean (1): Haiti 

Table 2. Least Developed Countries (UN definition) 



